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Abstract 

A review of Bayesian restoration of digital images based on Monte Carlo tech- 
niques is presented. The topics covered include Likelihood, Prior and Posterior 
distributions, Poisson, Binary symmetric channel and Gaussian channel mod- 
els of Likelihood distribution, Ising and Potts spin models of Prior distribution, 
restoration of an image through Posterior maximization, statistical estimation of 
a true image from Posterior ensembles, Markov Chain Monte Carlo methods and 
cluster algorithms. 



Introduction 

Noise, in a digital image, is an inevitable nuisance; often it defeats the purpose, the 
image was taken for. Noise mars the scene it intends to depict and affects the artist in 
the person looking at the image; it limits the diagnosis a doctor makes from a medical 
image; it interferes with the inferences a physicist draws from the images acquired by 
his sophisticated spectrometers and microscopes; it restricts useful information that 
can be extracted from the images transmitted by the satellites on earth's resources and 
weather; etc. Hence it is important that noise be eliminated and true image restored. 

Noise, in the first place, enters a digital image while being acquired due to, for 
example faulty apparatus and poor statistics; it enters while storage due to aging, 
defects in storage devices etc.; it enters during transmission through noisy channels. 
We shall not be concerned, in this review, with specific details of mechanisms of noise 
entry and image degradation. Instead, we confine our attention mostly to de-noising, 
also called restoration, of digital images. 

Conventional digital image processing techniques rely on the availability of qualita- 
tive and quantitative information about the specific degradation process that leads to 
noise in an image. These algorithms work efficiently on applications they are intended 
for; perhaps they would also work reasonably well on closely related applications. But 
they are unreliable for applications falling outside^ their domain. There are not many 

^ Often the mechanism of image degradation is not known reasonably well. In several situations, 
even if the mechanism is known, it is complex and not amenable to easy modeling. It is in these 
contexts that general purpose image restoration algorithms become useful. 
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image restoration algorithms that can handle a wide variety of images. Linear filters 
like low pass filters, high pass filters etc., and non-linear filters like median filters, con- 
strained least mean square filters etc., are a few familiar examples belonging to the 
category of general purpose algorithms. 

An important aim of image restoration research is to devise robust algorithms based 
on simple and easily implementable ideas and which can cater to a wide spectrum of 
applications. It is precisely in this context, a study of the statistical mechanics of images 
and image processing should prove useful. The reasons are many. 

- Statistical mechanics aims to describe macroscopic properties of a material from 
those of its microscopic constituents and their interactions; an image is a macro- 
scopic object and its microscopic constituents are the gray levels in the pixels of 
the image frame. 

- Statistical mechanics is based on a few general and simple principles and can be 
easily adapted to image processing studies. 

- Bayesian statistics is closely related to statistical mechanics as well as to the foun- 
dations of stochastic image processing in particular and information processing 
in general. 

- Several statistical mechanical models have a lot in common with image processing 
models. 

But why should we talk of these issues now ? Notice, images have entered our 
personal and professional life in a big way in the recent times - personal computers, 
INTERNET, digital cameras, mobile phones with image acquisition and transmission 
capabilities, new and sophisticated medical and industrial imaging devices, spectrome- 
ters, microscopes, remote sensing devises etc.. 

Let me briefly touch upon a few key issues of image processing to drive home the 
connection between statistical mechanics and image processing. Mathematically, an 
image is a matrix of non-negative numbers, representing the gray levels or intensities in 
the pixels of an image plane. A pixel is a tiny square on the image plane and represents 
an element of the picture. Pixels coarse-grain the image plane. The gray level in a 
pixel defines the state of the pixel. Pixels are analogous to the vertices of a lattice that 
coarse-grain space in a statistical mechanical two dimensional model. The gray levels 
are analogous to the states of spins or of atoms sitting on the lattice sites. Spatial corre- 
lations in an image can be modeled by defining suitable interaction between states of two 
pixels, exactly like the spin-spin interaction on a lattice model. In fact one can define 
an energy for an image by summing over these interactions. An exponential Bayesian 
Prior is analogous to Gibbs distribution. It is indeed this correspondence which ren- 
ders an image, a Markov random field and vice versa. We can define a KuUback-Leibler 
entropy distance for constructing a Likelihood probability distribution of images. Tem- 
perature, in the context of image processing, represents a parameter that determines 
the smoothness of an image. The energy-entropy competition, responsible for differ- 
ent phases of matter, is like the Prior - Likelihood competition that determines the 
smoothness and feature-quality of an image in Bayesian Posterior maximization algo- 
rithms. Simulated annealing that helps equilibration in statistical mechanical models 
has been employed in image processing by way of starting the restoration process at 
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high temperature and gradually lowering the temperature to improve noise reduction 
capabilities of the chosen algorithm. Mean-field approximations of statistical mechan- 
ics have been employed for hyper-parameters estimation in image processing. In fact 
several models and methods in statistical mechanics have their counter parts in image 
processing. These include Ising and Potts spin models, random spin models, Bethe 
approximation. Replica methods, renormalization techniques, methods based on bond 
percolation clusters and mean field approximations. We can add further to the list of 
common elements of image processing and statistical mechanics. Suffice is to say that 
these two seemingly disparate disciplines have a lot to learn from each other. It is 
the purpose of this review to elaborate on these analogies and describe several image 
restoration algorithms inspired by statistical mechanics. 

Image processing is a vast field of research with its own idioms and sophistication. 
It would indeed be impossible to address all the issues of this subject in a single review; 
nor are we competent to undertake such a task. Hence in this review we shall restrict 
attention to a very limited field of image processing namely the Bayesian de-noising 
of images employing Markov chain Monte Carlo methods. We have made a conscious 
effort to keep the level of discussions as simple as possible. 

Stochastic image processing based on Bayesian methodology was pioneered by Derin, 
Eliot, Cristi and Gemen [1], Gemen and Gemen [2] and Besag [3]. For an early review 
see Winkler [4] ; for a more recent review on the statistical mechanical approach to image 
processing see Tanaka [5]. Application of analytical methods of statistical mechanics 
to image restoration process can be found in [6]. The spin glass theory of statistical 
mechanics has been applied to information processing [7]. Infinite range Ising spin 
models have been applied to image restoration employing replica methods [8]. A brief 
account of the Bayesian restoration of digital images employing Markov chain Monte 
Carlo methods was presented in [9] . The present review, is essentially based on [9] with 
more details and examples. The review is organized as follows. 

We start with a mathematical representation and a stochastic description of an 
image. We discuss a Poisson model for image degradation to construct a Likelihood 
distribution. We show how KuUback-Leibler distance emerges naturally in the context 
of Poisson distributions. Then we show how to quantify the smoothness of an image 
through an energy function. This is followed by a discussion of Ising and Potts spin 
Hamiltonians often employed in modeling of magnetic transitions. We define a Bayesian 
Prior as a Gibbs distribution. Such a description renders an image a Markov random 
field. Bayes' theorem combines a subjective Prior with a Likelihood model incorporating 
the data on the corrupt image and gives a Posterior - a conditional distribution from 
which we can infer the true image. We show how the Likelihood and the Prior compete 
to increase the Posterior, the competition being tuned by temperature. The Prior tries 
to smoothen out the entire image with the Likelihood competing to preserve all the 
features, genuine as well as noisy, of the given image. For a properly tuned temperature 
the Likelihood loses on noise but manages to win in preserving image features; similarly 
the Prior loses on smoothening out the genuine features of the image but wins on 
smoothening out the noise inhomogeneities. The net result of the competition is that we 
get a smooth image with all genuine inhomogeneities in tact. The image that maximizes 
the Posterior for a given temperature provides a good estimate of the true image. We 
describe a simple Monte Carlo algorithm to search for the Posterior maximum and 
present a few results of processing of noisy toy and benchmark images. 
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Then we take up a detailed discussion of Markov Chain Monte Carlo methods based 
on Metropolis and Gibbs sampler; we show how to estimate the true image through 
Maximum a Posteriori (MAP), Maximum Posterior Marginal (MPM) and Threshold 
Posterior Mean (TPM) statistics that can be extracted from an equilibrium ensemble 
populating the asymptotic segment of a Markov chain of images. We present results 
of processing of toy and benchmark images employing the algorithms described in the 
review. Finally we take up the issue of sampling from a Prior distribution employing 
bond percolation clusters. The cluster algorithms can be adapted to image processing 
by sampling cluster gray level independently from the Likelihood distribution. Cluster 
algorithms ensure faster convergence to Posterior ensemble from which the true image 
can be inferred. Wc employ a single cluster growth algorithm to process toy and bench- 
mark images and discuss results of the exercise. We conclude with a brief summary 
and a discussion of possible areas of research in this exciting field of image processing, 
from a statistical mechanics point of view. 

Mathematical Description of an Image 

Consider a region of a plane discretized into tiny squares called picture elements or 
pixels for short. Let <S be a finite set of pixels. For convenience of notation we identify 
the spatial location of a pixel in the image plane by a single index i. Also we say that 
index i denotes a pixel. Mathematically, a collection Q = {6i : i G 5} of non-negative 
numbers is called an image. Let < oo denote the cardinality of the set S. In other 
words N is the number of pixels in the image plane. 9i denotes the intensity or gray 
level in the pixel i E S. The state of a pixel is defined by its gray level. Let us denote 
by © = {9i : i G S}, the true image. is not known to us. Instead we have a noisy 
image X = {xi : i G 5}. The aim is to restore the true image 6 from the given noisy 
image X. 

Stochastic Description of an Image 

In a stochastic description, we consider an image G as a collection of random vari- 
ables {9i : i G 5}^. Consider for example, A^ pixels painted with Q gray levels, 
{0, 1, • • • , Q — 1}. The gray level Label denotes black and the gray level label Q — 1 
denotes white. Each pixel can take any one of the Q gray levels. Collect all the possible 
images you can paint, in a set, denoted by the symbol Q, called the state space, fl is 
analogous to the coarse grained phase space of a classical statistical mechanical system. 
Let Q denote the total number of images in the state space. It is clear that Q = Q^. 
Both © and X belong to Q. If all the images belonging to O, are equally probable then 
we can resort to a uniform ensemble description. 

Figure (1 : Left) depicts a 10 x 10 image painted randomly with 5 gray levels labeled 
by integers from to 4, with the label denoting black and the label 4 denoting white. 
The 10 X 10 matrix of non-negative numbers (the gray level labels) that provides a 
mathematical representation of the image is also shown in Fig. (1 : Right). In image 

^Equivalently, we can consider an image as a particular realization of the set of random variables. 
For convenience of notation we use the same symbol 9i to denote the random variable as well as a 
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Figure 1: (Left) A 10 x 10 random image painted with five gray levels with 
representing black and 4 representing the white. (Right) The matrix of non- 
negative numbers (gray levels) representing the image. 

processing terminology, is called a random field^. Thus the noisy image X on hand 
is a random field. The noisy image X — {xj : i e 5} has 

- a systematic component representing the features of ©, that has survived image 
degradation 

and 

- a stochastic component that accounts for the noise that enters during degra- 
dation ^. 

A stochastic model for X is constructed by defining a conditional probability of X|0 
{X given 0). This conditional probability is denoted by £(X|0) and is called the 
Likelihood piohahilitj distribution, or simply the Likelihood. C{X\Q) can be suitably 
modeled to describe the stochastic degradation process that produced X from 0. 

Let us consider a simple and general model of image degradation based on Poisson 
statistics, which will render transparent the idea and use of likelihood distribution. 

Poisson Likelihood Distribution 

We consider the random field X as constituting a collection of independent random 
variable {xj : i G S}. We take each .Xj to be a Poisson random variable. The 
motivation is clear. Image acquisition is simply a counting process^. Poisson process is 

realization of the random variable. The distinction would be clear from the context. 

^We shall see later that in a stochastic description, an image can be modeled as a Markov random 
field. 

^the degradation of an image can occur during acquisition, storage or transmission through noisy 
channels. 

^We count the photons incident on a photographic or X-ray film; we count the electrons while 
recording diffraction pattern; etc. 
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a simple and a natural model for describing counting. There is only one parameter in 
a Poisson distribution, namely the mean. We take (xj) = 9i. The Likelihood of Xi\9i is 
given by, 



jC{xi\ei) = —{o^] eM-Oi) 

Xi 



(1) 



We assume {xi : i G 5} to be independent random variables. The joint distribution 
is then the product of the distributions of the individual random variables. Hence we 
can write the Likelihood distribution of X|6 as. 



cix\e) = HcixS) 



n 

ies 



Xi 



(2) 



The above Poisson Likelihood can be conveniently expressed as. 



where. 



£{X\e) = exp 



F(e,x) 



ies 



(3) 



9i - Xi \og{9i) 



(4) 



The advantage of Poisson and related likehhood distributions is that there is no hyper- 
parameter that needs to be optimized^. 

KuUback-Leibler Entropy Distance 

It is clear from the discussion on Poisson Likelihood that the function F(©, X) serves 
to quantify how far away is an image 9 from another image X. Let us investigate as 
to what extent does the function F{Q,X) meet the requirements of a distance metric. 

F(0, X) ^ when Q = X and it is not symmetric in its arguments. Let us write 
F{e,X) as. 



(5) 



ie5 



and study the behaviour of f{9i,Xi) as a function of 9i with a known value of Xj. We 
have, 

f{9i]Xi) = 9i- Xi\og{9i), 



dl 
d9i 



1 - 



Xi 



(6) 



^The Gaussian channel Likelihood has two hyper-parameters. The binary symmetric channel Like- 
lihood has one hyper-parameter. We shall discuss these types of degradation processes later. 
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It is clear from the above that, 



df 

— = 



(7) 



when 6i = Xi. In other words at 9i = Xi, the function f{6i;Xi) is an extremum. To 
determine whether it is a minimum or a maximum we consider the second derivative of 
f{9i; Xi) at 9i = Xi. We have, 



dOf 



Oi — Xo 



1 



> 0, 



(8) 



implying that f{Oi]Xi) is minimum at 9i = Xi. Let fm = Xi — Xilog{xi) denote the 
minimum value of f{Oi; Xi). We define a new function, 



g(,Oi,Xi) 



= {6i-x,) + ie,-Xi)\og{9i/x,). (9) 
The function g{6i, Xi) is not sjTumetric in its arguments. Hence we define, 

Fi{ei,Xi) = g{9i,Xi) + g{9i ^ Xi,Xi ^ 9i) 



= {9,-x,)\og{9,/xi) (10) 

We can now define a new distance between and X as 

F(e, X) = - X,) log(^,/x,) (11) 

ies 

We see that that the new distance function F{Q,X) = when Q = X and is positive 
definite when 7^ X; also F is symmetric in its arguments. We shall take F{Q,X) 
as a measure of the distance between the images G and X F{Q,X) is the same as 
the symmetric Kullback-Leibler entropy [10]. This is also called by various names like 
Kullback-Leibler distance, Kullback-Leibler divergence, mutual information, relative 
entropy etc., defined for quantifying the separation between two probability distribu- 
tions. We refer to F(6,X) given by Eq. (fTTj) as Kullback-Leibler distance. 

Hamming Distance 

Another useful function that quantifies the separation between two images G and X, 
defined on a common image plane, is the Hamming distance, defined as, 

F{Q,X) = J2m7^x^), (12) 

^strictly F{<d,X), given by Eq. lfTT)l is not a distance metric because it does not obey triangular 
inequality. 
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where the indicator function, 

{1 if the statement 77 is true, 
(13) 
if the statement rj is not true. 

Hamming distance simply counts the number of pixels in having gray levels different 
from those in the corresponding pixels in X. The Hamming distance F{Q,X) defined 
above is symmetric in its arguments; also F(Q,X) — 0, when Q — X. We shall find 
use for Hamming distance while processing binary images or even for processing images 
which have a few number of gray levels. However if the number of gray levels in an 
image is large, then the KuUback-Leibler distance should prove more useful. 

Gaussian Channel Likelihood Distribution 

A popular model for image degradation is based on assuming Xi to be Gaussian random 
variable with 

(xi) = a 9i, 

{x])-{x,f = a\ (14) 

In the above a and a are called the hyper-parameters of the Gaussian channel model. 
The Likelihood of xi\Qi is given by. 



L{xi\^^ = — \= exp 




(15) 



We assume {xj : i G 5} to be independent random variables. The Likelihood in the 
Gaussian channel model is given by. 



£(X|e) = _J-__exp 



where N is the total number of pixels in the image plane. 



«e5 



(16) 



Binary Symmetric Channel Likelihood Distribution 



Consider a binary image G which degrades to X by the following algorithm. Take a 
pixel i e <S in the image plane of ©. Call a random number ^ uniformly distributed in 
the unit interval. If ^ < p set Xi to the other gray level. Otherwise set Xi — Qi. This 
process is described by the Likelihood distribution given by. 



C{xi\Q,) = vT{xi^Q,)^{\-v)X{xi = Qi) 
It is convenient to express the above Likelihood as. 



L{xi\Qi) 



exp 



1 + exp(-/3i) 



(17) 



(18) 
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The hyper-parameter Pl is related to p as given below. 

exp(-/3L) 



1 + exp(-/3z,) 
& = log(i-l) 



(19) 



The procedure described above is repeated independently on all the pixels. The Like- 
lihood oi X\Q is then the product of the Likelihoods of Xi\9i and is given by, 



c{x\e) 



exp 




-PLFix,e) 




Exexp 


-PlF{X,Q) 



(20) 



where F{X,Q) is the Hamming distance between X and ©. The hyper-parameter 
(5l — 1/Tl can be interpreted as inverse temperature of the degradation process. When 
Ti = we have p = 0. In other words at Tj: = no degradation takes place and 
X = Q. As Tl increases the noise level increases. When Ti = oo. wc have p = 0.5 
and the image X i becomes completely random. In other words each pixel can have 
independently a black or white gray level with equal probability. 



Degradation of a Multi-Gray Level Image 



Consider an image painted with Q gray levels with labels 0, 1, 2, • ■ ■ , Q — 1. The 
label denotes black and the label Q — 1 denotes white. Let us say that the degradation 
of to X proceeds as follows. Take a pixel i E S. Call a random number ^ uniformly 
distributed in the range to 1. If ^ < p, then select one of the Q — I gray levels 
(excluding 9i) randomly and with equal probability and assign it to Xj. Otherwise set 
Xi — 9i. This process of degradation is described by the Likelihood given by, 



£{xi\ei) = p x{x, e,) + {1 - p) i{xi = Oi) 

It is convenient to express the above Likelihood as, 

oxp [ - 8a(x, ^ % 

^^"^^'^^^ = l + (Q-l)exp(-/3^)- 
The hyper-parameter (5l is related to p as given below, 

exp(-/3i) 



(21) 



(22) 



P 



1 + (g - 1) exp(-/3L) 



log (g-i)(^-i) 
p 



(23) 



The degradation process described above is repeated independently on all the pixels. 
Then the Likelihood of X|0 is the product of the Likelihoods of Xi\9i and is given by. 



c{x\e) 



exp 




-PLF{x,e) 




Ex exp 


' -PlF{X,Q) 



(24) 
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where F(X,Q) is the Hamming distance between X and 0. The hyper-parameter 
/3l = 1/Ti can be interpreted as inverse temperature of the degradation process. When 
= we have p = indicating no degradation and X = 0. As increases the noise 
level increases. When T^, = oo, we have p — 1 — {1/Q) which tends to unity when 
Q — > oo. In all the above if substitute Q — 2we recover the results of binary symmetric 
channel degradation. 



Three Ingredients of Bayesian Methodology 

In a typical image processing algorithm, we start with an initial image 0o. A good 
choice is 0o = X, the given noisy image. We generate a chain of images 

00 ^ 01 ^ 02 >Qn^ ©n+1 ^ (25) 

by an algorithm and hope that a suitably defined statistics over the ensemble of images 
in the chain would approximate well the original image 0. In constructing a chain of 
images we shall employ Bayesian methodology. 

The first ingredient of a Bayesian methodology is the Likelihood which models the 
degradation of to X. We have already seen a few models of image degradation. The 
second ingredient is an a priori distribution, simply called a Prior, denoted by 7r(0). The 
Prior is a distribution of gray levels on S. The choice of a Prior is somewhat subjective. 
The third ingredient is an a posteriori distribution or simply called Posterior, denoted 
by the symbol 7r(0|X). Posterior is a conditional distribution; it is the distribution of 
Q\X. 



Bayesian a Priori Distribution 

As we said earlier, the choice of a Prior in Bayesian methodology is subjective. The 
Prior should refiect what we believe a clean image should look like. We expect an image 
to be smooth. We say that a pixel is smoothly connected to its neighbour if the gray 
levels in them are the same. Smaller the difference between the gray levels, smoother 
is the connection. This subjective belief is encoded in a Prior as follows. 

We introduce an interaction energy between the states of two pixels. We confine 
the interaction to nearest neighbour pixels. Thus the states of a pair i,j of nearest 
neighbour pixels interact with an energy denoted by E^j. We take the interactions to 
be pair-wise additive and define an energy of the image as, 

E(0) = Y.^^m■ (26) 

{id) 

In the above, the symbol (i, j) denotes that the pixels i and j are nearest neighbours 
and the sum is taken over all distinct pairs of nearest neighbour pixels in the image 
plane. We model E^j in such a way that it is small when the gray levels are close to 
each other and large when the gray levels differ by a large amount. Thus, a smooth 
image has less energy. We define the Prior as. 
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where (3p = 1/Tp and Tp is a smoothening parameter called the Prior temperature. 
Z{j3p) is the normalization constant and is given by, 



Z(Dp) 



E 



exp 



- (3pE{Q) 



(28) 



e 



The Prior, as defined above, is called the Gibbs distribution. Such a definition of Prior 
implies that © is a Markov random field. Conversely if © is a Markov random field, 
then its Prior can be modeled as a Gibbs distribution. 



Markov Random Field 

A Markov Random field generalizes the notion of a discrete time Markov process, or also 
called a Markov chain. Appendix (2) briefiy describes a general as well as a time homo- 
geneous or equilibrium Markov chain. A sequence of random variables, parametrized by 
time and obeying Markovian dependence, constitutes a Markov chain. Instead of time 
we can use spatial coordinate as parameter to describe the chain, in one dimensional 
setting. The concept of Markov random field extends Markovian dependence from one 
dimension to a general setting [11] 

In a Markov random field ©, the state of a pixel is at best dependent on the state 
of the pixels in its neighbourhood^. Accordingly, let i e 5 be a pixel and let Vi denote 
the set of pixels that form the neighbourhood of i. Let A/'={z/j : i E S} denote a 
neighbourhood system for S. In other words, N is any collection of subsets of S for 
which, 

1) i^Ui 

2) i e Vj <S=^ j e Vi. 

The pair {S,J\f} is a graph. An image © is a Markov random field if 

T^{0,\{9j : j^i;jeS}^=n{e,\{ej : j G i/,}) V i G 5. (29) 
Prom the definition of conditional probability in terms of joint probability, we have, 

7r(0) = 7r(^^i,^2,--- ,^^7v) 

= n(e,\{e,:j^i;jeS}y{9j:j^v,jeS) ^ ie S (30) 
Since © is a Markov random field, the above can be written as, 

7r(0) = n(^9,\{9^ : j G z/,})7r(^^, : j j E S) W i e S. (31) 



* Often the nearest neighbours of a pixel constitute its neighbourhood; for example the four pixels 
situated to the left, right, above and below the pixel i form the neighbourhood of i; this definition of 
neighbourhood is employed in the square lattice models of statistical mechanics. In image processing, 
the eight pixels - four nearest neighbours and four next-nearest neighbours, are taken as neighbourhood 
in several applications, like for example low pass filter. Even 24 pixels surrounding a central pixel are 
also often considered as neighbourhood, in filter algorithms. 
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Iterating alternately, we get, 

7r(e) = n 7r(^,|{0, :jGz/j). (32) 

ies 



Let us define, 

U,{&) = -\ogn{e,\{e, : 3 e u,}'^, (33) 



and 



Then, 



7r(0) = iexp[-f/(e)], (35) 

where Z is the normalization constant. This expression for the Prior is identical to 
the Gibbs distribution if we identify U{Q) = PpE{Q) and Z as the Z{Pp), defined in 
Eq. (|28p . Note that E{Q) is a sum of the interaction energies of the states of nearest 
neighbour pixels and hence is consistent with the Markov random field requirement. It is 
important that the interaction must be restricted to a finite range for to be a Markov 
random field. Since we consider in this review only nearest neighbour interactions, this 
condition is automatically satisfied. 

The equivalence of Gibbs distribution in statistical mechanics and Markov random 
field in spatial statistics was established, see [12,13] following the work of Hammersley 
and Clifford [14]. 



Ising Model for a Priori Distribution 

Ising model [15] is the simplest and perhaps the most studied of models in statistical 
mechanics. Let us consider a Prior inspired by Ising model. We consider an image 
having only two gray levels, designated as say ( = 1, 3 or equivalently S = ±1; we 
have the transformation ^ = 5' + 2. The index ( = 1 {S = —1) refers to black and C = 2 
[S = +1) refers to white gray level. The interaction energy of the states of two nearest 
neighbour pixels i and j is given by, 

E„(e) = my^e^). (36) 

It is clear that if the two pixels have the same gray level, the interaction energy is zero. 
If their gray levels are different the interaction energy is unity. In the language of the 
physicists, the ground state of the two-pixel system has zero energy and the excited 
state has unit energy. The two energy levels are separated by AE = 1. The energy of 
an Ising image G is given by^, 

E{e) = = ^ ^^■) • (37) 

^The Ising Hamiltonian in statistical mechanics is given hy E — — j) ^i^j^ where Si — ±1 is 
the spin on the lattice site i and J is the strength of spin-spin interaction, usually set to unity. The 
ground state of the a pair of nearest neighbour spins is of energy — 1 and the excited state is of energy 
+ 1; the energy separation, is thus 2; i.e. AE = 2 
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The Ising Prior can be written as, 



7r(e) 



1 



exp -PpY.^{9ij^9j) 



(38) 



where Z{(3p) is the canonical partition function. Ising Prior in conjunction with Ham- 
ming Likelihood is suitable for restoring binary images. 

Potts Model for a Priori Distribution 

Potts model [16] is an important lattice model in which there are discrete number of 
states at each site. Consider a multiple gray level image. The gray levels are taken as 
{0, 1, 2, ■ ■ ■ ,Q — 1}. We have then a Q state Potts model. The label denotes black, 
and the label Q — 1 denotes white gray levels. The different shades correspond to the 
intermediate Potts numbers. The expressions for the energy associated with the states 
of two nearest neighbour pixels, for the Potts Hamiltonian and for the Potts Prior are 
the same as the ones defined for the Ising model given in the last section^''. Potts Prior 
in conjunction with Hamming or Kullback-Leibler Likelihood is suitable for restoring 
images painted with a small number of gray levels. 

Gemen-McClure Model for a Priori Distribution 

Gemen and McClure [17] recommended a Prior in which the states of two neighbouring 
pixels interact with an energy given by. 



where C is a hyper-parameter that determines the width of the distribution. The value 
of Eij[Q) ranges from a minimum of —1 when \9i — 9j\ = to a maximum of 0, 
when \9i — 9j\ — > oo. Gemen- McClure interaction potential is depicted in Fig. (2) for 
C = 0.1, 1.0, and 10.0. The function Eij{Q) is symmetric about \9i — 9j\ — and 
its width decreases with increase of C. For large C the Gemen-McClure interaction 
reduces to an Ising interaction with ground state energy at —1 and excited state energy 
at 0. The Gemen-McClure Prior is thus given by. 



where Z[l3p) is the partition function that normalizes the Prior. 

^^\a statistical mechanics, the energy associated with a pair of nearest neighbour Potts spins 
— J6si,Sj, where Si = 0, 1,-- - ,Q — 1 are the Potts spin labels and J is the strength of spin-spin 
interaction, usually set to unity. The ground state is of energy zero and the excited state is of energy 
unity; the energy separation is unity, i.e. AE = 1. 



1 



(39) 



l + C(9i-9j)^' 




(40) 
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Figure 2: Gemen-McClure energy Eij versus 16*1 = 6j\ for C = 0.1,1.0,10.0; when 
C increases the width of the function decreases. In the limit C ^ oo the Gemen- 
McClure model reduces to the Ising model with ground state at —1 and excited 
state at 0. 
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Relation between j3p and B 



It is clear from the discussion above that the hyper-parameter (5p will depend on the 
smoothness and features present in 0. The Prior is given by, 



7r(e) = 



exp 



l3pE{Q) 



Z{(5p) 

where Z{Pp) is the partition function, 

z{(3p)^J2^w\-PpE{e) 



(41) 



(42) 



The value of (3p is implicitly given by the relation below. 

d 



log Z{Pp) 



(43) 



Consider for example the Ising/Potts Priors. The energy of is completely determined 
by the number of pairs of nearest neighbour pixels with dis-similar gray levels^^ . In the 
graph theoretic language two nearest neighbour pixels arc separated by an edge. Let 
Ne denote the total number of edges in image plane of ©. If two nearest neighbour 
pixels have the same gray levels then we say the edge separating them is a satisfied 
bond^^. Let B{Q) denote the number of satisfied bonds in the image 0. In other 
words B is the number of nearest neighbour pairs of similar pixels. We see immediately 
that E{Q) — Ne — B. Thus the value of the hyper-parameter Pl can in principle be 
calculated from the values of Ne and S of a given image 0. 



Bayesian a Posteriori Distribution 



The Posterior distribution in the Bayesian methodology is given by the product of the 
Likelihood and the Prior. This is called Bayes' theorem, discussed for e.g. in [18,19]. 
Appendix 1 states Bayes' theorem. 
According to Bayes' theorem. 



n{e\x) 



c{x\e)%{e) 



(44) 



The Bayesian Posterior can be formally written as. 



exp 



7r(0|X) 



-<;/5iF(0,X) + /3pE(0)| 



Ee^xp 



-\(3LF{e,X)^(3pE{Q) 



(45) 



It is convenient to work in terms of intensive quantities. To this end we define E per 
pair of nearest neighbour pixels and F per pixel. Accordingly we divide energy by, A^^;, 

^^Two nearest neighbour pixels having the same gray level are called similar pixels. If the gray levels 
are different they are called dissimilar pixels 
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the number of nearest neighbour pairs of pixels in the image plane. For an L x L pixel 
image, we have A^^; = 2L{L — 1) ^ 2L^(for large L). We divide F{&,X) by L^. In 
the expression for the Posterior above, the argument of the exponential function can 
be viewed as a weighted sum of F{Q,X) and E{<d). The weight attached to E is /3p 
and the weight attached to the F is Pl- The relevant quantity is the relative weights 
attached to F and E. Hence we set = 1 and (3p — P, where (3 is the value of 
(3p measured in units of Pl- Also, it is convenient to work with normalized weights 
and accordingly we divide the whole exponent by 1 + /3. The final expression for the 
Posterior that we use in the image restoration algorithm is given by 



The advantage of the above expression is there is only one parameter P = 1/T that 
needs to be tuned for good image restoration^^. 



The aim of image restoration is to construct the true image © from the given noisy 
image X. The simplest estimate of O is the image which maximizes the Posterior. 

For a given temperature, the Posterior increases when F{Q,X) decreases. In other 
words closer is to X, larger is the value of 7r(0|X). Though X is corrupt, it is the 
best we have and we would like to retain during image-restoration process as many 
features of X as possible. 

The Posterior increases when E decreases for a given temperature T. In other 
words smoother is, larger is the value of 7r(0|X). Thus, there is a Prior- Likelihood 
competition, the Prior trying to smoothen the image and the Likelihood trying to keep 
the image as close to X as possible. In other words the Prior tries to make all the pixels 
acquire the same grave level and the Likelihood tries to bind the image to the data X. 
As a result we get an image which has in it the features of X (because of competition 
from the Likelihood) and which is smooth with out noise (because of competition from 
the PriorY'^. The nature of Prior- Likelihood competition is tuned by Temperature, see 
below. 

What happens when the temperature T is large ? 

Temperature determines the relative competitiveness of the Likelihood and Prior toward 
increasing the Posterior. For a given T, 

^Ht is purely for convenience that we have reduced the number of hyper-parameters from two to 
one. Thus we have only one hyper-parameter denoted by /3 = 1/T. We call T the temperature. For 
many applications we would require two or more hyper-parameters which need to be optimized for 
good image restoration. Pyrce and Bruce [6], for example, have considered both Pl and 0p separately 
and carried out Monte Carlo search in the two dimensional hypor-paramctcr space for good image 
restoration. In fact they talk of first order transition line that separates the Prior dominated and the 
Likelihood dominated phases in the context of Ising spin model of image restoration. 

^*We can say that this kind of Prior- Likelihood competition is analogous to the entropy-energy 
competition which determines the phase of a material. 





(46) 



Ee exp - L-2(l + /?)-i{f(0, X) + 2-i/?£;(0)} 



Bayesian Maximum a Posteriori (MAP) 
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- the relative weight attached to F(e, X) is T/{1 + T), and 

- the relative weight attached to £^ is 1/(1 + T). 

When T is large, feature - retention {i.e. the Likelihood) is given more importance. 
However if T is very large, the image restoration algorithm would interpret even the 
noise in X as a feature and retain it; as a result the restored image would be noisy. 

What happens when the temperature T is small ? 

When T is small, smoothcning {i.e. the Prior) is given more importance. However if 
T is too small, there is a danger: the image restoration algorithm would misinterpret 
even genuine inhomogeneities and fine features of X as noise and eliminate them. 

Best image restoration is expected over an intermediate range of T, often obtained 
by trial and error. The problem of image restoration is thus reduced to a problem 
of sampling, independently and randomly with equal probabilities, a large number of 
images from the state space Q. These images constitute a microcanonical ensemble. 
Calculate the Posterior of each member of the microcanonical ensemble. Pick up the 
image which gives the highest value of the Posterior and call it ©map; the suffix MAP 
stands for the Maximum a Posteriori. Gmap is called an MAP estimate of ©. We can 
also employ other statistics, defined over the state space Q, to estimate ©, see below. 



Maximum a Posteriori Marginal (MPM) 

We partition the state space fl into mutually exclusive and exhaustive subsets as de- 
scribed below. Consider a pixel i E S. Define Q^'^ as a subset of images for which the 
gray level of the pixel i is C, where C ^ {0; 1; • ■ - Q — 1}. 

nf = I© e Q, OiiO) = c| for C = 0, 1, • • • Q - 1 (47) 
Calculate now marginal Posteriors, 

7r«= ^(^1^) for C = 0,l,-..g-l . (48) 



e f^(*' 



Thus we get an array of Q marginals ^t:^^^ : ( = 0,1, ■ ■ ■ Q — 1^ . We define 

Cmpm = argm<« Trf (49) 

which stands for the value of C that maximizes the function 7r^*^ In other words, Cmpm 
denotes the value of the gray level for which the marginal Posterior is maximum. Repeat 
the above for all the pixels in the image plane, and obtain a collection of numbers that 
represents an approximation to ©: 



© 



MPM 



©MPM is called a Maximum a Posteriori (MPM) estimate of ©. 
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Threshold a Posteriori Mean (TPM) 

Calculate an average image, Q = {9i : i & S}, where, 



We define. 



^. = ^^.(e) 7r(e|x) V i e s. (51) 

Gen 



CSm = argmin [c-5]e,(e)7r(e|x]' (52) 



which stands for the value of ( that minimizes [( — ^q^q 6*1(8)]^. In other words Ctpm 
is the value of C closest to 9i. Carry out the above exercise for all the pixels in the 
image, and get a collection of gray levels that represents an approximation to ©: 



e 



TPM 



^[C^l^-.tes]. (53) 



©TPM is called Threshold a Posteriori Mean (TPM) estimate of 0. It is easily seen that 
for a binary image 0tpm = ©mpm- 



Elements of Digital Image Restoration 

A typical image restoration process is depicted in Fig. (3). We represent the unknown 
true image by ©, which gets corrupted to X. The degradation of © to X is modeled 
by the Likelihood C{X\Q). Bayes theorem helps synthesize the Likelihood (available in 
the form of a degradation model and data on the corrupt image X) with a Prior (that 
models your subjective beliefs about true image). The result is a Posterior 7r(0|X). 
An ensemble of images having the Posterior distribution is shown as in Fig. (3). 
From this Posterior ensemble we can make MAP, MPM and TPM estimates of the 
true image. 



Algorithm for Calculating 0map 

A straight forward procedure to calculate 0map is to attach a Posterior probability 
to each image belonging to the state space fl and pick up that image which has a 
maximum Posterior probability. This is easily said than done. Note that the number 
of images belonging to state space Q is Q^. Let us consider a small, say 10 x 10 binary 
image (Q = 2). The number of pixels is thus 100. Then Q = 2^°° 10^°. Calculating 
the Posteriors for these 10'^° images is an impossible task even on the present day 
high speed computers^^. A simple procedure would be to sample randomly and with 
equal probability a certain large number of images in the neighbourhood of the given 
image X; these images constitute a microcanonical ensemble. Find the image, in the 

^^A typical image is of size between 256 x 256 to 1024 x 1024 with gray levels ranging from 2, for a 
binary image to 256. The state space will contain order of iQ^^'^o^ images. 
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Figure 3: Digital image restoration : Q denotes the true image (first square in 
the top row) which gets corrupted to X (second square in the top row). The 
degradation of 9 to X is modeled by C{X\Q). Bayes theorem helps construct a 
Posterior 7r{Q\X) from the Likelihood and a Prior. A Posterior ensemble of images ( 
set of squares in the right end of top row). MAP, MPM and TPM estimates of 
6 (depicted by the three squares in the second row) are made from the Posterior 
ensemble. See [6] 



microcanonical ensemble, that maximizes the Posterior and recommend it as an MAP 
estimate of O. An algorithm for doing this is described below. 

Fix the temperature. Start with an image ©o = Call this the current image. 
i.e. Qc = Qo 

(1) Calculate TTc = 7r(ec|X). 

(2) Select a pixel, say j, randomly from the image plane. 

(3) Switch the gray level of the pixel j from its current value to a value sampled 
randomly and with equal probability from the gray level labels 

C = {o, 1, ••• , g-i}. 

(4) Calculate n = 7r{Qt\X). 

(5) If TTt > TTc, then accept the trial image and set ©i = otherwise ©i = ©o- 

(6) Take ©i as the current image ©c, 

(7) go to step (1). 

Iterate the whole process several times. A set of N update attempts constitutes an 
iteration. Collect the images at the end of each iteration. Thus we get a sequence of 
images with monotonically non-decreasing Posteriors. Asymptotically we get ©map, 
called an MAP estimate of ©. 



How does Posterior maximization lead to image restoration ? 

A simple explanation of how does a Bayesian Posterior maximization lead to de-noising 
is depicted in Fig. (4) and described below. Consider a local region of the current image 
©c having a pixel m with gray level label 1 and with all its four nearest neighbours 
having gray level label 3. Let us denote by the symbol u the set of five pixels : the pixel 
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Figure 4: Local region of 6c is depicted at Left. The gray level label of the 
central pixel is switched from 1 to 3 and a trial image Qt is constructed. The 
corresponding local region of 6t is depicted at Right. This move leads to local 
smoothening of the image. Let us assume that the sub-image depicted at Left 
belongs to X and that depicted at Right belongs to 6. Then the above move 
increases F by one unit and decreased £' by 4 units. The Posterior increases if 
T < 2. In other words Prior wins over Likelihood if Tl < 2, leading to de-noising. 



m and its four nearest neighbours. Let us say that in 0, the pixels belonging to u have 
all the same gray level 3 and hence is smooth locally. It is the process of degradation 
that has to led to the noise inhomogeneity in the pixel m. We have, 

F{ec,X) = J2^{9i{ec)y^x?j+J2^{9i{ec)y^x?j (54) 

Let us denote the first term in the above sum (over pixels not belonging to v) as Fq. The 
second term is zero since 9i{Qc) — Xi{X) ^ i e u. We get F{Qc: X) — Fq. Similarly, 

E{Qc) = J2^{e,{Qc)^ej{Qc))+J2^{e,{Qc)^ej{Qc)), (55) 

i^m i=m 

where denotes that i and j are nearest neighbour pixels and the sum extends over 
all distinct nearest neighbour pairs of pixels. Let us denote by Eq the first term in the 
above. The second term is 4 since there are four dis-similar nearest neighbour pairs of 
pixels in the sub-image u. Hence E{Qc) = Eq + 4. In the simulation, we change the 
gray level of pixel m from it present value of 1 to 3 and call the resulting image as G^. 
We find that F{Qt,X) = Fq + I and E{Qt) = Eq. Eventhough the move increases F by 
one unit, it decreases E by four units. Let the Posteriors of and be denoted by 
TTc and TTt respectively. The ratio of the Posteriors can be calculated and is given by. 



— = exp 



L2(l + /3) 



(1-2/5) 



(56) 



which is greater than unity whenever j3 > (1/2) or T < 2. Since we accept Qt only 
when TTf is greater than ttc, de- noising takes place when T < 2. 

It is clear from the discussion above, that removal of an inhomogeneity entails 
energy reduction. Hence a Prior tries to remove all inhomogeneities in the image plane 
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including the gcmiinc ones. On the other hand, the Likelihood binds the image to the 
data X. It tries to retain all the features of X including the inhomogeneities that have 
their origin in noise. It is the temperature that tunes the competition between the 
Prior and the Likelihood. 

Restoration of Toy Images 
Binary Robot 

We have created a 92x92 binary image 6, referred to as ROBOT, depicted in Fig. (5 : Left). 
To corrupt a binary image with noise we proceed as follows. We select a pixel i and 
change the gray level label from its present value to the other with a probability 0.05: 
Call a random number if ^ < 0.05, change the gray level. Otherwise do not change 
the gray level. Carry out this exercise on all the pixels independently. This is equivalent 
to adding 5% noise to the image; there is no spatial correlations in the noise added. 
The resulting corrupt image, called X is depicted in Fig. (5 : Middle). Employing Ising 
Prior and Hamming Likelihood we have calculated an MAP estimate of © at T = 0.51 
depicted in Fig. (5 : Right). 

Five-gray Level Robot 

We have constructed a 5 gray level ROBOT image on a 56 x 56 image frame depicted 
in Fig. (6 : Left). We corrupt the image with 5% noise and get X depicted in Fig. 
(6 : Middle). Employing Potts Prior and Hamming Likelihood we have made an MAP 
estimate of the true image which is depicted in Fig. (6 : Right). The image restoration 
has been carried out at T = 0.51. We monitored the Bayesian Posterior maximum at 
the end of each iteration. The Posterior maximum is a monotonically non-decreasing of 
the iteration index. Eventually the Posterior maximum saturates as shown in Fig. (7). 
MAP estimates of the 5-gray level Potts image made at a high temperature (T = 2.5) 



True Image 5 % T = 0.5 




Figure 5: Binary ROBOT image; L = 92. (Left) true image (Middle) Image X 
constructed by adding noise to Q, as described in the section 'Binary Symmetric 
Channel Likelihood distribution'with p = 0.05 and (Right) Restored image, 0map- 
Complete restoration happens after one iteration. Image restoration has been 
carried out at T = 0.5 

and at a low temperature 0.01 and at intermediate temperature {T — 0.51) are shown 



22 



Markov Chain Monte Carlo Image Restoration 



in Fig. (8); the high temperature ©map is noisy and the low temperature ©map has 
several of its fine features distorted. Good image restoration obtains for temperatures 
in the neighbourhood of 0.5. 

Restoration of Benchmark Images 

Binary Lena Image 

We have taken a benchmark image called the Lena, painted with 256 gray levels. We 
have converted it to a binary image and is displayed in Fig. (9 : Left). We introduce 5% 
noise and the resulting image X is also shown in Fig. (9 : Middle). We have employed 
Ising Prior and Hamming distance and made an MAP estimate of the original image. 
The restored image ©map is depicted in Fig. (9 Right). 

We have processed the same image at temperatures T = 0.1, 1.1, 2.5 and the results 
are displayed in Fig. (10). At T = 0.1, the algorithm has removed all the fine features 
of the image. At T = 2.5 the algorithm has interpreted even noise as features and 
retained them. Good image restoration seems to happen at temperatures between 1 
and 1.5 

Five and Ten Gray Level Lena Images 

We have coarse grained the same Lena image to 5 gray levels and the result is shown in 
Fig. (11 : Left). The image corrupted with 5% noise is depicted in Fig. (11 : Middle) 
and an MAP estimate made with Potts Prior and Hamming distance is depicted in Fig. 
(11 : Right). 

Results of image restoration of Lena image with 10 gray levels are shown in Fig. 
(12). It is clear from the discussions above such a search algorithm would be time 
consuming. It would be advantageous to obtain an ensemble of images which has the 
desired Posterior distribution so that the required statistics can be calculated by simple 
arithmetic averaging over the Posterior ensemble. 
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True Image 5% T = 0.51 

*. '^^^L^. ■' ^^^^^ 

Figure 6: ROBOT image: L=56; 5 gray levels; Image restoration has been carried 
out employing Potts Prior, Hamming Likelihood. (Left) True image (middle) 
Image X constructed by adding noise to Q, as described in the section 'Degra- 
dation of a multi-gray level image' with p = .05, and (right) restored image 0map 
at T = 0.51. 



0.965 



0.96 




921 1 1 1 1 1 1 1 

5 10 15 20 25 30 35 

ITERATION INDEX 

Figure 7: Posterior maximum versus iteration index, for image restoration shown 
in Fig. (6) 
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T = 0.01 T = 0.51 T = 2.5 

Figure 8: Robot image: L = 56; 5 gray levels. Image restoration employs Potts 
Prior and Hamming Likelihood. The Maximum a Posteriori (MAP) estimates at 
low : T = .01 (Left), intermediate T = 0.51 (middle) and high : T = 2.5 (right) 
temperatures are depicted. 



True Image 5% T = 1 .5 




Figure 9: Restoration of the benchmark binary Lena image, employing Ising Prior 
and Hamming Posterior. L = 141 (Left) True image Q (Middle) Image X con- 
structed by adding noise to as described in section 'Binary Symmetric Chan- 
nel Likelihood distribution' with p = .05. (Right) Restored image, 0map after one 
iteration, image processing has been carried out at temperature T = 1.5 
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T = 0.1 T = 1.1 T = 2.5 




Figure 10: Restoration of binary Lena image at three temperatures. MAP esti- 
mates of e are given. (Left) T = 0.1 (Middle) T = 1.1 and (Right) T = 2.5. 6 and 
X are the same ones depicted in Fig. (9). 



True Image 5% T = 1.0 




Figure 11: Restoration of the benchmark 5-gray level Lena image employing Potts 
Prior and Hamming distance. (Left) True image (Middle) Image X constructed 
by adding noise to Q as described in section 'Degradation of a multi-gray level 
image' with p = 0.05. The image restoration was done at T=1.0 

It is precisely in this context that the Metropolis algorithm [20], discovered for pur- 
pose of obtaining a canonical ensemble of microstates in statistical mechanics becomes 
useful. We outline below a Markov chain Monte Carlo technique in conjunction with 
the Metropolis algorithm and its variant, to obtain a Posterior ensemble of images. 

Markov Chain Monte Carlo for Sampling from a Pos- 
teriori Distribution 

Monte Carlo is a numerical technique that makes use of random numbers to solve a 
problem. Historically, the first large scale Monte Carlo work carried out dates back 
to the middle of twentieth century. This work pertained to simulation of neutron 
multiplication, scattering, streaming and eventual absorption in a medium or escape 
from it. Application of Monte Carlo method to problems in statistical mechanics started 
with the discovery of the Metropohs algorithm [20] which generates a Markov chain of 
microstates converging to the desired ensemble. There exist a vast body of hterature on 
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True image 



Image with noise 5% 



'° Clean image at T=1 




Figure 12: Restoration of the benchmark 10-gray level Lena image employing Potts 
Prior and Hamming distance. (Left) True image 6 (Middle) Image X constructed 
by adding noise to G as described in section 'Degradation of a multi-gray level 
image' with p = 0.05. The image restoration was done at T=1.0 

Monte Carlo technique. We refer to [22-29] for some of them. Monte Carlo constitutes 
a natural numerical technique for image processing. We describe below a Markov Chain 
Monte Carlo algorithm for image restoration. 

We fix the temperature T. We start with an arbitrary image ©o- A good choice 
of ©0 is Then we construct a Markov chain whose asymptotic segment contains 
images belonging to the desired Posterior ensemble at the chosen temperature. In a 
Markov chain ©o ^ ©i — > ■ • ■ ^ ©„ the image ©fc+i depends only on ©^ 

and not on the previous history. In Appendix 2 we have described briefly, the basic 
elements of a Markov chain and its construction by Monte Carlo algorithms. For a 
detailed description of Markov chain see [31]. Here we describe only the operational 
details of a Markov chain Monte Carlo algorithm. 

First calculate the Posterior, 7r(©fc|X) (upto a normalization constant) of the current 
image Qk', denote it by the symbol tt^. Select randomly a pixel from the image plane. 
Change its gray level randomly^^, and get a trial image Qf Calculate the Posterior of 
the trial image, 7r(©t|X) and denote it by ttj. Accept the trial image with a probability 
p given by. 



This is called the Metropolis algorithm [20]. This is also known as Metropolis-Hastings 
algorithm in image processing literature^^. We notice that the Metropolis acceptance 
is based on the ratios of the Posteriors. The normalization constants cancel. Hence it 
is adequate if we know the Posterior upto a normalization constant. 

We can also employ Gibbs' sampler^^ in which the acceptance probability is given 

^^For example if we are processing a binary image, then switch the gray level of the chosen pixel 
from its present value to the other; if we are processing an image with Q gray levels, then change the 
gray level of the chosen pixel to one of the Q values randomly. 

-'^ "^Metropolis algorithm is a special case of a more general Hastings algorithm [30] . 

^^The Gibbs sampler was discovered in statistical physics by Cruetz [32] and is known by the name 
heat-bath algorithm; it has also been discovered independently in the context of spatial statistics by 
Ripley [33], Grenader [34] and Gemen and Gemen [2]. The name Gibbs sampler is due to Gemen and 




(57) 
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by 



P=^^. (58) 



The acceptance/rejection step is implemented as follows. We call a random number ^, 
distributed uniformly in the range zero to unity. If ^ < p, accept the trial image and 
set ©fe+i = 6*. Otherwise reject the trial image and set Qk+i — ©fc- Repeat the above 
on the image 6fe+i. Iterate and construct a Markov chain of images given by 

Oo ^ Oi ^ 02 ^ . 0„ ^ 0„+i ^ • • • (59) 

In practice, we define a consecutive set of N attempted updates (successful or otherwise) 
as constituting a Monte Carlo Sweep (MCS), where N is the total number of pixels in 
the image being processed. We take the image at the end of successive MCS and 
construct a Markov chain. We have shown in Appendix 2 that the asymptotic part of 
the chain {Om : m > n ^ oo} contains images that belong the desired Posterior 
ensemble. Let F denote the set of images taken from the asymptotic segment of the 
Markov chain. F is called the Posterior ensemble. Let F denote the total number of 
images in F. MAP, MPM and TPM estimates of O can be made from the Posterior 
ensemble F as follows. 



OmAP = arg max 7t{Q\X) (60) 

For calculating MPM estimate, we partition F into mutually exclusive and exhaustive 
subsets of images as described below. Consider a pixel i E S. Define F^*^ as a subset of 
images for which the gray level of pixel i is C, where ( — 0,1 ■ ■ ■ Q — 1: 

rf = {o e F, ^,(0) = c} for c = 0, 1, • • ■ , g - 1 (61) 

Let F^*^ denote the number of images belonging to the subset F^'^ Then we have, 

CSpm = arg ma. r« (62) 

(63) 

In other words, Cmpm ^^'^ value of ( for which F^*'' is maximum, while Ctpm 
value of C, closest to 6i - the arithmetic average of the gray level of pixel i taken over 
images belonging to the Posterior ensemble F. We depict in Fig. (13) MAP, MPM and 
TPM estimates for a binary ROBOT image with L = 56. 

We have not presented in this review detailed results on the MPM and TPM esti- 
mates of the true image, because we find that these estimates arc not good if we have 
only a single hyper-parameter in the image restoration algorithm We find that retain- 
ing two independent hyper-parameters one from the Likelihood model of degradation 

Gemen [2]. The Glauber algorithm [35] is the same as heat-bath algorithm, but for a minor irrelevant 
detail [29]. 



STPM 



arg min 
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MAP estimate MPM estimate tpM estimate 




Figure 13: Image restoration employing Ising Prior, Hamming Likelihood and 
Markov Chain Monte Carlo with Metropolis acceptance. MAP (Left), MPM 
(middle) and TPM (Right) estimates made at T = 0.5 are shown. These statis- 
tics have been obtained from the ensemble of images sampled employing Markov 
chain Monte Carlo in conjunction with the Metropolis algorithm. The Gmap ob- 
tained from an earlier Monte Carlo search algorithm was taken as input to avoid 
removing initial images from the Markov chain for purpose of equilibration. All 
the three estimates give identical results. In any case for binary image one can 
show that by definition MPM and TPM estimates are the same 



{(3l) and the other from the Prior model of out subjective expectation (/5p) would be 
required for a meaningful estimate of through MPM and TPM statistics employing 
Markov chain Monte Carlo technique in conjunction with Metropolis algorithm or Gibbs 
sampler. We need to consider images that are consistent and that are not consistent 
with the Likelihood and Prior model assumptions and investigate the performance of 
Markov chain Monte Carlo restoration algorithms. Work in this direction is in progress 
and would be communicated soon. 

In the Metropolis or the Heat-bath algorithms described above, only a single pixel 
is updated at a time. For restoring a large image, single pixel update algorithms can 
be frustratingly time consuming. Also if there arc strong correlations in an image, 
which often is the case, we could think of updating the states (gray levels) of a large 
cluster of pixels, to speed up the algorithm. If we take an arbitrary cluster of pixels of 
the same gray level and update their gray levels coherently, most often the trial image 
constructed would get rejected in the Metropolis step. Hence it important to have a 
correct definition of a cluster. It is in this context a study of the cluster algorithm 
proposed by Swendsen and Wang [36] becomes relevant. 



Swendsen-Wang Cluster Algorithm 

Swendsen and Wang [36] derived an ingenious cluster algorithm for mapping the Ising/Potts 
spin problem to a bond-percolation problem based on the work of Kastcleyn and For- 
tuin [37] and Coniglio and Klein [38]. The algorithm was originally intended for over- 
coming the problem of critical slowing down near second-order phase transition. This 
algorithm has been adapted to several problems in image processing since recent times. 
A comparison of Swendsen-Wang algorithm and Gibbs sampler was reported in [39]. 
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The dynamics of Swcndsen-Wang algorithm in image segmentation has been investi- 
gated in [40]. Swcndsen-Wang algorithm has found applications in Bayesian variable 
selection particularly for problems where there are multi-colinearities amongst predic- 
tors [41]. Very recently a version of Swendsen-Wang algorithm for removing Poisson 
noise from medical images was proposed [42]. In fact it has been recognized that the 
Swendsen-Wang algorithm is a particular example of a more general auxiliary variable 
method pioneered by Edwards and Sokal [43] and Besag and Green [44]. For a review 
of auxiliary variable method see [45]. 



Auxiliary Bond Variables 

Let us first consider sampling of images from the Ising/Potts Prior. Introduce a collec- 
tion e, of binary random variables : e = {e^j i,j & S} where i,j are nearest neighbour 
pixels. These are called auxiliary bond random variables, e^j = or 1. If 6i = 9j then 
we say there exists a satisfied bond between i and j. A satisfied bond may be occupied 
(ejj = 1) or may not he occupied {tij = 0). If 6'^ 7^ 9j then there is no bond connecting 
the pixels i and j. In other words €ij — 0, if 9i 7^ 9j. Let p denote the probability 
of occupying a satisfied bond in 0. The probability of not occupying a satisfied bond 
is q = 1 — p, so that p + q = 1. We do not yet know the value of p we must use for 
obtaining an ensemble of images distributed as per the Ising/Potts Prior. We shall take 
up this question after going through the following preliminaries. 



Conditional, Joint and Marginal Priors 

Construction of a bond structure on the given image, described above, can be expressed 
mathematically as. 





= = Oj) 


= P: 




= 0\0^ = 9j) 


= Q, 




^l\9iy^9,) 


= 0, 




= o\ei ^ 9,) 


= 1. 



(64) 

The conditional Prior in terms of p and q is given by. 



{i,3) 

The joint Prior is given by. 



-(e,e) = i-n 



(65) 



q X(e,j = 0)+p J(e,j = 1) I{9i = 9j) 



(66) 
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where Z\ is the normahzation constant. From the joint Prior 7r(0, e), we can derive an 
expression for the marginal Prior 7i{6) and is given by 



vr 



(6) = 5^vr(e,e) 

e 

= ^ n [p^(^^ = + { 



(67) 



What is the value of p appropriate for sampling from the Ising/Potts Prior 
? 



Let us start with the Ising/Potts Prior given by Eq. (|38|) and express it in a different 
form as described below. 

— exp[-/3Vx(0,^^^,) 



7r(e) 



(iJ) 



Z{(3) 



Z{(3) 



X e 



n 



l-e-^)j(^, = %)+exp(-/3) 



(6^ 



Comparing the above with the marginal Prior T{{Q\e) given by Eq. ()67j) we can conclude 
that p = 1 — exp(— /5), q = exp(— /?) and the Ising/Potts partition function Z{i3) is the 
same as the normalization constant of the joint Prior Z\^ . 

Another way of looking at the same thing, which gives a better insight into the 
Swendsen-Wang algorithm, is as follows. Let Ne denote the total number of distinct 
nearest neighbour pairs of pixels in the image plane. If a pixel is considered as a vertex, 
then Ne is the number of edges in the graph. Let B denote the total number of like 
pairs of pixels in an image 6; i.e. B is the total number of satisfied bonds in 6. The 
Ising/Potts Prior can be written in terms of Ne and B as follows. Consider Eq. ()38p . 
The summation in the exponent of the Ising/Potts Prior can be split into a sum over 
like pairs and a sum over unlike pairs. 



(id) 



Ne~B 



(id) 



(69) 



^^For the Ising model in statistical mechanics, p = 1 — exp(— 2/3), since the ground state and the 
excited state of a pair of nearest neighbour Ising spins differ by two units of energy, i.e. AE = 2. 
However, for Potts spin model p = 1 — exp(— since AE 
models considered here AE = 1 and hence p = 1 — exp(— 



1. In both the Ising and Potts Prior 
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Thus we get, 

vr(e) = J^exp[-/?(iV^-i?)] (70) 

Let b denote the number of occupied bonds in the image B. Note that 6 is a random 
variable. For an image B, the value of b can range from to B. It is easily seen that 
the joint Prior 7r(B, e) can be expressed in terms of A'"^; and b as, 

vr(B,6) = ^g^--y (71) 

Integrate vr(B, e) over e (keeping in mind that we need to sum over the distribution of 
b) and get the marginal distribution of B in terms of Ne and B: 



B\ 



Z, ^ b\ {B-b)\ 



gNE~b pb 



— g^^-^ V g^-" 



= ^^"^^""^ (72) 
^1 

The above is the same as the Ising/Potts Prior given by Eq. (jTOJ, if we identify that 
Zi = Z{P), q = exp(— /3) and hence p = 1 — exp(— /3). We also could have come to the 
same conclusion directly from the marginal Prior given by Eq. ()67|) , 

vr(B) = ^{p + qf g^--^ = ^ g^--^. (73) 



Conditional a Priori Distribution 7r(0|e) 

For practical implementation of the algorithm, we need conditional Priors: 7r(e|B) and 
7r(B|e). The simulation strategy is to sample alternately from these two conditional 
Priors and construct a Markov chain of images which asymptotically converges to the 
Prior ensemble. We already have an expression for 7r(e|B) given by Eq. (|65|). with 
q = exp(— /5) and p = 1 — exp(— /5). 

To derive an expression for the conditional Prior 7r(B|e), we proceed as follows. We 
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have, 



7r(e|6) 



7r(6) 



Iliij) Q = 0)+p I{e,j = 1)I{9, = 9j) 



Ci j — — 1 



1 



(74) 



QNcie) 



where Nc is the number of clusters generated by imposing e bonds on the image. Es- 
sentially we have converted an interacting system into a non interacting cluster system. 
Each cluster acts independently. We switch the gray level of each cluster of pixels 
randomly, coherently and independently. 

Sampling e from 7r(e|6fc) 

Let Gfc denote the current image in the Markov chain. Take a pixel i; this constitutes 
a single-pixel cluster. Let z/j denote the set of pixels that are nearest neighbours of i. 
Take a pixel j e Ui. If Oj{Qk) 7^ Oi{Qk) do not put a bond between them. On the other 
hand if Oj{Qk) — we say there exists a satisfied bond between the two pixels. 

Then call a random number ^. If < p = exp(— /5) then put a bond between them. 
In other words we occupy the satisfied bond with a probability p. If the decision is to 
occupy the satisfied bond then add the pixel to the cluster and form a two-pixel cluster. 
Repeat the above on all the nearest neighbours of i and on all the nearest neighbours 
of pixels that get added to the cluster. The cluster growth will eventually terminate. 
Thus we get a cluster grown from the seed pixel i and let us denote the cluster by the 
symbol Ci. . Start with a new pixel that docs not belong to Ci. Grow a cluster in the 
region of the image plane excluding the one occupied by ci. Call this cluster C2. The 
process of growing cluster from new seeds would eventually stop when all the pixels 
in the image plane have been assigned with one cluster label or other. Thus we have 
non-overlapping and exhaustive set of clusters on the image plane. 

Sampling from 7r(9|e) 

Change the gray levels of all pixels in a cluster to a gray level randomly chosen from 
amongst the Q gray levels. Carry out this process on each cluster independently. Re- 
move the bonds in the resulting image and call it ©fe+i- 

Construction of Prior Ensemble 

Start with ©0 — ^- Form clusters on the image plane, by sampling e from 7r(e|©fe) 
as described above. Get a new image ©1 by sampling from 7r(©|e) as described above. 
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Iterate and get a Markov chain of images. The asymptotic part of the Markov chain 
would contain images belonging to the Prior ensemble. 

Next, we turn our attention to adapting the Swendsen-Wang algorithm to image 
restoration wherein we need to sample images from the Posterior and not from the 
Prior. The key point is that having formed clusters on the image plane of we can get 
©fe+i by sampling the cluster gray levels from the Likelihood distribution independently. 
The whole procedure is described below. 



Posterior Ensemble: Swendsen-Wang Algorithm 

Start with an arbitrary image Gq. A good choice is 0o = X. Sample bond variables e 
from the conditional Prior 7r(e|6o) as described earher. We get clusters of pixels. 

Consider one of the clusters, say the A;-th cluster. Let rjk denote the pixels belonging 
to the cluster. Sample a gray level for this cluster from the Likelihood distribution. 



PC 



exp 













(75) 



Sampling from the discrete distribution {p^ : C = 0) 1) Q — ^} is carried out as 
follows. Calculate the cumulative distribution 



Pn = 



k 

m=l 
1 



(76) 



Let ^ be a random number uniformly distributed between and 1. The gray level k 
for which -Pfc < ^ < Pk+i is assigned to the pixels in the chosen cluster. Repeat the 
exercise for all the clusters independently. Remove the bonds. The resulting image ©i 
is the image for the next graph construction. Iterating we get a Markov chain. 



©0 ^ (©o,eo) ^ (©i,eo) ^ ©i ^ (©i,ei) 

©n (©n,en) (©n+1, £„) — > ©n+1 •• 



(02, ei) 



which converges asymptotically to the to the Posterior ensemble. We can make an 
MAP, an MPM or a TPM estimate the true image from the Posterior ensemble. 



Posterior Ensemble: Wolff's Algorithm 

Wolff [46] proposed a simple modification to the Swendsen-Wang cluster algorithm. 
Wolff's cluster algorithm can be adapted to image restoration as described below. 
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A single cluster is grown from a randomly chosen pixel employing Swendsen-Wang 
prescription. The gray level of all the pixels in that cluster is updated by sampling 
from the Likelihood distribution. This results in a new image that constitutes the next 
entry in the Markov chain. A pixel is chosen randomly in the new image and the whole 
process is repeated. A Markov chain of images is constructed. The asymptotic part of 
the Markov chain would contain images that belong to the desired Posterior ensemble. 
An estimate of can be made from the Posterior ensemble employing MAP or MPM 
or TPM statistics. 

We have employed Wolff's cluster algorithm in restoration of a binary and a 5-gray 
level ROBOT image, at T = 0.51. The results are depicted in Figures (14) and (15). 



True Image Image with 5% noise Clean Image at T=0.5 




Figure 14: Image restoration employing Ising Prior, Hamming Likelihood and Wolff 
cluster algorithm for estimating Omap? for a binary ROBOT image. (Left) O 
(middle) X (Right) MAP estimate, image processing has been carried out at 
temperature T = 0.5 



True Image Image with noise 5% Clean Image at T=0.51 




■ J- 



Figure 15: Image restoration employing Potts (5 gray levels) Prior, Hamming 
Likelihood and Wolff cluster algorithm for estimating Smap? for a ROBOT image. 
(Left) (middle) X (Right) MAP estimate, image processing has been carried 
out at temperature T = 0.5 
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Discussions 

We have presented a brief review of Markov chain Monte Carlo methods for restora- 
tion of digital images. There are three stages. The first consists of modehng of the 
degradation process by a conditional distribution called the Likelihood. The data on 
the given corrupt image is incorporated in the Likelihood. The second stage consists of 
constructing a priori distribution. Our expectations of how a true image should look 
like are modeled in the Prior. Bayes theorem combines the likelihood and the Prior 
into a Posterior. The third stage consists of sampling images from the Posterior distri- 
bution, and making a statistical estimate of the true image. This is where Monte Carlo 
methods come in. 

We have described Likelihoods based on Poisson model, Kullback-Leibler entropy 
distance, Hamming distance, binary symmetric channel and Gaussian channel. The 
Prior model discussed include Ising and Potts spin. We have given a simple explanation 
of how does a Posterior maximization lead to image restoration and described a Monte 
Carlo algorithm that does this. 

Metropolis algorithm and Gibbs sampler are elegant techniques for sampling images 
from a Posterior distribution. The strategy is to start with a guess image and generate 
a Markov chain of images. We take a time homogeneous Markov transition matrix 
that obeys detailed balance with respect to the desired Posterior distribution. Detailed 
balance ensures asymptotic convergence of the Markov chain to the Posterior ensemble. 
Both Metropolis and Gibbs sampler obey detailed balance condition. The topics of 
general Markov chain, time homogeneous Markov chain, Markov transition matrix, 
balance equations, balance and detailed balance conditions, asymptotic convergence, 
time reversal of Markov chain, etc were discussed in the Appendix 2. In the main text, 
the steps involved in the implementation of Markov chain Monte Carlo algorithm were 
presented. 

Then wc took up the issue of cluster algorithms often employed in the context of 
second order phase transition. Cluster algorithms give you images that belong to the 
Prior ensemble. To generate a Posterior ensemble, we sample the gray level of all the 
pixels in a cluster randomly and independently from the Likelihood distribution. We 
have described Swcndscn-Wang and Wolff cluster algorithms. 

We can say that the techniques of simulating a canonical ensemble in statistical 
mechanics including cluster algorithms have been adapted to stochastic image restora- 
tion, with a reasonably good measure of success. Recently, in statistical mechanics, 
non-Boltzmann ensembles like multicanonical or entropic ensembles [47-49], have be- 
come popular. An advantage of such non-Boltzmann sampling techniques is that from 
a single simulation we can get information over a wide range of temperature by suitable 
reweighting schemes. The microstates that occur rarely in a canonical ensemble would 
occur as often as any other microstates in a multi canonical or entropic ensemble. The 
important point is that the multicanonical and entropic sampling Monte Carlo were 
intended for overcoming the problem of super-critical slowing down in first order tran- 
sition. In the context of image restoration the transition from Prior dominated phase 
to Likelihood dominated phase is first order [6]. Hence we contend that multicanoni- 
cal/entropic samphng would have a potential application in image restoration. Work 
in this direction is in progress and would be reported soon. 

Investigating algorithms based on invasion percolation [50-52] and probability chang- 
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ing clusters [53,54] may also prove useful in image restoration. The reference noisy im- 
age X acts as an inhomogeneous external field. This is analogous to an Ising magnetic 
system in which the transition becomes first order due to presence of external field. 
It would be interesting to develop cluster algorithms that incorporate the presence of 
external fields in statistical mechanical models and adapt them to image processing. 
Also in stochastic image processing, ideas from random field Ising models may prove 
useful. 

We can add to the list further. But then we stop here. We conclude by saying 
that analogy between image processing and statistical mechanics is rich and exploring 
common elements of these two disciplines should help toward developing efficient image 
processing algorithms. 
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Appendix 1 

A Quick Look at Bayes' Theorem 




Figure 16: Illustration of Bayes' theorem 

Consider mutually exclusive and exhaustive events 0, Q^^\ Q^'^\ 
We have, 



and 



enew = ene(2) = eWne(2) 



Let X gQ. We have, 



Prior 
Posterior 
Likelihood 



7r(0) 

7r(0|X) 

C{X\Q) 



£(X|0) = 



7r(e) 



vr(eix) = 

P{X) 

P{X) = C{X\Q)n{Q) + £(X|e«)7r(0^'^) + C{X\&^^)n{&^^) 
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Appendix 2 

A Quick Look at Markov Chains 

Joint and Conditional Distributions 

Consider a system which can be in any of the states denoted by 

{ai, a2 ■ ■ ■}. 

Starting from an initial state bo E A time 0, the system visits in successive time 
steps the states, 

bi^A, 62 e ^, • • • , bn^A, ■■■ . 

The sequence of states bo E A, bi e A, ■ ■ ■ are random. Let P{bm &n-i, • • • ^1, ^0) be 
the joint probability Prom the definition of conditional probability, 



Iterating we get. 





• • , bo) = P{bn\bn-1, ■ 


■■ ,bo) 




X P{bn-U--- , 


bo) 


P{bni bn-1, ■ ■ 


■ , bo) = P{bn\ hn-1, ■ ■ 


bo) 




X P(6„_l| bn-2. 


■■■,bo) 




X P{bn-2\ bn-3. 


bo) 




X • • • 






X Pib2\bi, bo) 






X P{b,\bo) 






X P{bo) 





Markovian Assumption 

The sequence of states 60, bi, • • • , 6„ constitutes a Markov chain, if 

P{bk\bk-i, bk-2, •■■ , bi, bo) = P{bk\bk-i) V /c = 1, n 
As a consequence we have, 

n 

P{bn, bn-1, ••• , 61, bo) = P{bo)l[P{bk\bk-i) 

k=i 

The state of the system at time step k + 1 depends only on its present state (at time 
step k) and not on the states it visited at all the previous time steps, — 1, A; — 2, • • • , 2, 
1, 0. Future is independent of the past once the present is specified. We call P{bk+i\bk) 
the transition probability at time k. In general the transition probability depends on 
k. 
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Stationary Markov Chain 

We specialize to the case when the transition probabihty is independent of the time 
index k. Then we get a homogeneous or stationary Markov chain. We have therefore, 

P{bk+i = ai\bk = ttj) = Mij V k 

Thus a (homogeneous) Markov chain is completely defined once we specify, 

- the state space A = (oi, a2, • • • } 

- the probabilities {Mjj : V i, j} for transition from one state {aj e A) to another 
state (oj e A), and 

- the initial state bo e A 
Markov Transition Matrix 

M, defined above, is called the transition matrix. For every state aj, because of nor- 
malization, we have, 

Y^P{ai\aj) = J2Mij^l 

i i 

In other words, the elements of each column of the transition matrix M add to unity 
A matrix with non-negative elements and for which the elements of each column add 
to unity is called a Markov matrix 

Let {U\ denote the uniform imnormalizcd probability vector (1, 1, ■ ■ ■ , 1). It is 
easily seen that {U\ is the left eigenvector of M corresponding to the eigenvalue unity: 

{U\M^{U\. 

The right eigenvector of M corresponding to eigenvalue unity is called the invariant 
distribution of M, and is denoted by |7r): M\n) = \n); the largest eigenvalue of M is 
real , unity and non-degenerate =^ \n) is unique. 

The eigenvectors of M form a complete set. Let Ittq) be an arbitrary initial vector 
with non-zero overlap with |7r) i.e. (ttItto) ^ 0. We express 

|7ro) = |7r)(7r|7ro) + ^|A)(A|7ro) 

where |A) is the eigenvector corresponding to eigenvalue A 7^ 1 and |A = 1) = |7r) 

M"|7ro) = |7r)(7r|7ro) + 5^A"|A)(A|7ro) 
„~oo k) since iXl^'^Soo 

How do we construct a Markov matrix M whose invariant vector is the desired distri- 
bution Itt) ? 



40 



Markov Chain Monte Carlo Image Restoration 



Balance Condition 

Let P{ai, n) denote the probabihty that the system is in state Oj at discrete time n. In 
other words P{ai, n) is the probability that 6„ = Oj. Formally we have the discrete time 
balance equation 



= E 



MijP{aj,n) - Mj^iP{ai,n) 



+ P{ai,n) 



We need P{ai, n + 1) = P{ai, n) = i when n ^ cx) for asymptotic equilibrium. Wc 
can ensure this by demanding that the sum over j of the terms in the right hand side 
of the master equation be zero with respect to 

In other words, we demand 



= 0. 



3 

This is called a balance condition. The balance condition can be written as 

M\7r) = |7r). 

Detailed Balance Condition 

It is difficult to construct an M whose asymptotic distribution is the desired |7r) em- 
ploying the balance condition. Hence we employ a more restrictive detailed balance 
condition by demanding each term in the sum be zero: 

71 j Mi J = TTiMj^i. 

The physical significance of the detailed balance is that the corresponding equilib- 
rium Markov chain is time reversible. This means that it is impossible to tell whether 
a movie of a sample path of images is being shown forward or backward. 

Time Reversal or tt Dual of M 

Consider a time homogeneous Markove chain denoted by 

6o &i bk-i bk ^ bk+i — > • • • 6jv. 

Let us consider a chain whose entries are in the reverse order and denote it 
6o ^ ^1 bk-i bk ^ bk+i — > • • • 6jv, 

where 



bk 



->N-k- 
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Notice that the forward and the reversed chain have entries belonging to the same state 
space A. The reverse chain is also stationary and Markovian and hence is completely 
specified by the state space A, a Markov transition matrix denoted by M and an initial 
state bo = b^- Aim is to express M in terms of M and |7r). 

To this end, we define a joint probability matrix W and express it in terms of M 
and Itt). We define 

Wij = P{bn+i = ai, bn = aj) V i, j 

Since we are considering stationary Markov chain W is independent of the time index n, 
except for the time ordering . In general one must consider a time reversal of transition 
matrix. Since in an equilibrium system a forward transition and its reverse occur with 
the same probability, we can write formally, 

where M is the time reversal or the 7r-dual of M, given by, 

M = diag ({7r,})Mtdiag 

where diag({?7j}) denotes a diagonal matrix whose (diagonal) elements are {rii} and 
is the transpose of M. The matrix M is Markovian; its invariant vector is the same as 
that of M. If a transition matrix obeys detailed balance then M — M. 

Metropolis Algorithm 

Mij — a min( 1, — ) 
where a is a convenient constant introduced for ensuring normalization: 

Glauber/Heat-Bath/Gibbs Sampler 

It is easily verified that the Metropolis algorithm and the Glauber/Heat-bath/Gibbs 
algorithm obey detailed balance condition; also only the ratio of the probabilities appear 
in these algorithms; It is this property that comes handy in generation of a Markov 
chain for a statistical mechanical system wherein the normalization called the partition 
function is not known. Hence for the Metropolis and Gibbs sampler, it is adequate if 
we know the desired target distribution up to a normalization constant. 
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